A Distributed Consistent Global Checkpoint Algorithm with a Minimum Number of Checkpoints
Author
Abstract
A distributed coordinated checkpointing algorithm is presented. A consistent global checkpoint is a set of local states, one per process, in which no message is recorded as received by one process but as not yet sent by another. The algorithm obtains a consistent global checkpoint for any checkpoint initiation by any process. Under Chandy and Lamport's assumption that one consistent global checkpoint is obtained for a set of concurrent checkpoint initiations, the total number of checkpoints is minimized. The paper then relaxes this assumption in order to further reduce the number of checkpoints.
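The consistency condition stated in the abstract can be checked mechanically. The sketch below assumes a hypothetical representation in which each process's local checkpoint stores the identifiers of the messages it had sent and received before the checkpoint was taken. The names `orphan_messages` and `is_consistent`, and the `(sender, receiver, seq)` tuple, are illustrative only and are not taken from the paper. The sketch only tests a given set of checkpoints; it does not show how the algorithm chooses them.

```python
from typing import Dict, Set, Tuple

# Hypothetical message identifier: (sender, receiver, sequence number).
Msg = Tuple[int, int, int]


def orphan_messages(sent: Dict[int, Set[Msg]],
                    received: Dict[int, Set[Msg]]) -> Set[Msg]:
    """Messages recorded as received in some local checkpoint but not
    recorded as sent in the sender's local checkpoint."""
    orphans: Set[Msg] = set()
    for msgs in received.values():
        for m in msgs:
            sender = m[0]
            if m not in sent.get(sender, set()):
                orphans.add(m)
    return orphans


def is_consistent(sent: Dict[int, Set[Msg]],
                  received: Dict[int, Set[Msg]]) -> bool:
    # In-transit messages (sent but not yet received) are allowed;
    # only orphan messages make the global checkpoint inconsistent.
    return not orphan_messages(sent, received)


# P0's checkpoint records sending (0, 1, 1); P1's records receiving it.
sent = {0: {(0, 1, 1)}, 1: set()}
received = {0: set(), 1: {(0, 1, 1)}}
print(is_consistent(sent, received))  # True

# P1 now records receiving (0, 1, 2), which P0's checkpoint never sent.
received[1].add((0, 1, 2))
print(orphan_messages(sent, received))  # {(0, 1, 2)}
```

A message sent before the sender's checkpoint but received after the receiver's checkpoint is in transit, not orphaned, so it does not violate the definition.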
Similar resources
A Distributed First and Last Consistent Global Checkpoint Algorithm
Distributed coordinated checkpointing algorithms are discussed. The first global checkpoint for a checkpoint initiation is a set containing the checkpoint for each process in which any checkpoint before the element is not consistent with the initiation. The last global checkpoint for a checkpoint initiation is a set containing the checkpoint for each process in which any checkpoint after the el...
An optimistic checkpointing and message logging approach for consistent global checkpoint collection in distributed systems
Checkpointing and rollback recovery are widely used techniques for achieving fault-tolerance in distributed systems. In this paper, we present a novel checkpointing algorithm which has the following desirable features: A process can independently initiate consistent global checkpointing by saving its current state, called a tentative checkpoint. Other processes come to know about a consistent g...
Direct Dependency-Based Determination of Consistent Global Checkpoints
Building consistent global checkpoints that contain a given set of local checkpoints has usually been handled by transitive dependency tracking. This implies the use of a vector of integers piggybacked on each message of the computation (the vector size being the number of processes). In this paper we address the problem of obtaining consistent global checkpoints including a given subs...
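The transitive dependency tracking mentioned above, with an integer vector of one entry per process piggybacked on every message, is commonly sketched as follows. This is a minimal illustration of the general technique under stated assumptions, not the direct-dependency method the cited paper proposes; the `Process` class, its `dv` field, and its method names are hypothetical.

```python
from typing import List


class Process:
    """Hypothetical process tracking transitive dependencies on
    checkpoint intervals with a vector of size n (number of processes)."""

    def __init__(self, pid: int, n: int) -> None:
        self.pid = pid
        # dv[j]: latest checkpoint interval of P_j this process depends on.
        self.dv: List[int] = [0] * n

    def take_checkpoint(self) -> List[int]:
        saved = list(self.dv)      # vector stored with the local checkpoint
        self.dv[self.pid] += 1     # begin a new checkpoint interval
        return saved

    def send(self) -> List[int]:
        return list(self.dv)       # piggyback a copy of the vector

    def receive(self, piggybacked: List[int]) -> None:
        # Component-wise maximum propagates dependencies transitively.
        self.dv = [max(a, b) for a, b in zip(self.dv, piggybacked)]


p0, p1, p2 = Process(0, 3), Process(1, 3), Process(2, 3)
p0.take_checkpoint()       # P0 enters interval 1
p1.receive(p0.send())      # P1 now depends on P0's interval 1
p1.take_checkpoint()       # P1 enters interval 1
p2.receive(p1.send())      # P2 never talked to P0 directly
print(p2.dv)               # [1, 1, 0]
```

The last line shows why the vector is needed: P2 learns of its dependency on P0 only through P1, which is what makes the tracking transitive.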
A Low Overhead Minimum Process Global Snapshop Collection Algorithm for Mobile Distributed System
Coordinated checkpointing is an effective fault-tolerance technique in distributed systems, as it avoids the domino effect and requires minimal storage. Most earlier coordinated checkpointing algorithms are either minimum-process but block computation during checkpointing, or non-blocking but force all nodes to take checkpoints even though many of them may not be necessary, or non-...
Checkpoint and Rollback in Asynchronous Distributed Systems
This paper proposes a novel algorithm for taking checkpoints and rolling back the processes for recovery in asynchronous distributed systems. The algorithm has the following properties: (1) Multiple processes can simultaneously initiate the checkpointing. (2) No additional message is transmitted for taking checkpoints. (3) A set of local checkpoints taken by multiple processes denotes a consist...